ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols

Authors

  • Tao Li
  • Lizy Kurian John

Abstract

Directories have been used to maintain cache coherence in shared memory multiprocessors with private caches. The traditional full map directory tracks the exact caching status of each shared memory block and is designed to be efficient and simple. Unfortunately, its inherent growth in directory size (one presence bit per processor for every memory block) makes it unsuitable for large-scale multiprocessors. In this paper, we propose a new directory scheme, dubbed the associative full map directory (ADir_pNB), which reduces the directory storage requirement. ADir_pNB uses one directory entry to maintain the sharing information for a set of exclusively cached memory blocks in a centralized linked-list style. By implementing dynamic cache pointer allocation, reclamation, and replacement hints, ADir_pNB can be implemented as a full map directory with lower directory memory cost. Our analysis indicates that, on a typical architectural paradigm, ADir_pNB reduces the memory overhead of a traditional full map directory by up to 70-80 percent. In addition to the low memory overhead, we show that the proposed scheme can be implemented with appropriate protocol modifications and hardware additions. Simulation studies indicate that ADir_pNB achieves performance competitive with Dir_pNB. Compared with limited directory schemes, ADir_pNB shows more stable and robust performance across applications with a spectrum of memory sharing and access patterns, because it eliminates directory overflows. We believe that ADir_pNB can be employed as a design alternative to the full map directory for moderately large-scale, fine-grain shared memory multiprocessors.

Index Terms: Cache coherence, directory protocols, shared memory multiprocessors, computer architecture.
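The pointer-management idea in the abstract (sharer pointers allocated on demand, linked per block, and reclaimed on invalidation or on a replacement hint) can be sketched as follows. This is a minimal Python sketch of generic dynamic pointer allocation under stated assumptions, not the ADir_pNB entry format: it does not model one entry covering a set of exclusively cached blocks, and the names `PointerPool`, `Directory`, `read_miss`, `write_miss`, and `replacement_hint` are hypothetical.

```python
# Hypothetical sketch of dynamic pointer allocation for a directory.
# Not the ADir_pNB design: each block simply owns a linked list of
# sharer pointers drawn from one shared, fixed-size pool.

NIL = -1  # end-of-list / "no node" marker


class PointerPool:
    """Fixed-size pool of pointer nodes. Each in-use node names one caching
    processor and links to the next node of the same block's sharer list."""

    def __init__(self, size: int) -> None:
        self.proc = [NIL] * size
        # Initially every node is threaded onto the free list: 0 -> 1 -> ... -> NIL.
        self.next = [i + 1 if i + 1 < size else NIL for i in range(size)]
        self.free = 0 if size > 0 else NIL

    def alloc(self, p: int) -> int:
        """Take a node from the free list and point it at processor p."""
        if self.free == NIL:
            raise MemoryError("pointer pool exhausted")
        node = self.free
        self.free = self.next[node]
        self.proc[node] = p
        self.next[node] = NIL
        return node

    def release(self, node: int) -> None:
        """Reclaim a node by pushing it back onto the free list."""
        self.proc[node] = NIL
        self.next[node] = self.free
        self.free = node

    def free_count(self) -> int:
        """Number of nodes currently on the free list."""
        n, node = 0, self.free
        while node != NIL:
            n += 1
            node = self.next[node]
        return n


class Directory:
    """Maps block address -> head node of that block's sharer list."""

    def __init__(self, pool_size: int) -> None:
        self.pool = PointerPool(pool_size)
        self.head: dict[int, int] = {}

    def sharers(self, block: int) -> list[int]:
        """Walk the block's linked list and collect the sharing processors."""
        out, node = [], self.head.get(block, NIL)
        while node != NIL:
            out.append(self.pool.proc[node])
            node = self.pool.next[node]
        return out

    def read_miss(self, block: int, p: int) -> None:
        """Record processor p as a sharer by prepending a freshly allocated node."""
        if p in self.sharers(block):
            return
        node = self.pool.alloc(p)
        self.pool.next[node] = self.head.get(block, NIL)
        self.head[block] = node

    def write_miss(self, block: int, p: int) -> list[int]:
        """Return the sharers to invalidate, reclaim all of the block's nodes,
        and leave the writer p as the only recorded copy."""
        to_invalidate = [q for q in self.sharers(block) if q != p]
        node = self.head.pop(block, NIL)
        while node != NIL:
            nxt = self.pool.next[node]  # read the link before release overwrites it
            self.pool.release(node)
            node = nxt
        self.read_miss(block, p)
        return to_invalidate

    def replacement_hint(self, block: int, p: int) -> None:
        """Cache p evicted its clean copy: unlink and reclaim p's node."""
        prev, node = NIL, self.head.get(block, NIL)
        while node != NIL:
            nxt = self.pool.next[node]
            if self.pool.proc[node] == p:
                if prev == NIL:
                    self.head[block] = nxt
                else:
                    self.pool.next[prev] = nxt
                self.pool.release(node)
                break
            prev, node = node, nxt
        if self.head.get(block, NIL) == NIL:
            self.head.pop(block, None)  # drop empty lists from the map
```

Because nodes come from one shared pool, storage in this sketch scales with the number of pointers the pool holds rather than with one presence bit per processor for every memory block. A limited directory with a fixed number of pointers per block would instead have to invalidate an existing sharer once that count was exceeded. Pool exhaustion simply raises an error here; how a real protocol handles it is not modeled.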

Similar resources

Performance Evaluation of Link-Based Cache Coherence Schemes

Large-scale shared-memory multiprocessors rely on private coherent caches by using directory-based protocols. Directory-based protocols preserve network bandwidth by reducing the number of consistency actions. A critical issue becomes how they maintain state information about the set of caches and how they reduce read and write latencies. These tradeoffs are studied in this paper. We study two l...

An Efficient Hybrid Cache Coherence Protocol for Shared Memory Multiprocessors

This paper presents a new tree-based cache coherence protocol which is a hybrid of the limited directory and the linked list schemes. By utilizing a limited number of pointers in the directory, the proposed protocol connects the nodes caching a shared block in a tree fashion. In addition to the low communication overhead, the proposed scheme also contains the advantages of the existing bit-ma...

Processor-Directed Cache Coherence Mechanism – A Performance Study

Cache coherent multiprocessor architecture is widely used in the recent multi-core systems, embedded systems and massively parallel processors. With the ever increasing performance gap between processor and memory, there is a requirement for an optimal cache coherence mechanism in a cache coherent multiprocessor. The conventional directory based cache coherence scheme used in large scale multip...

An Efficient Tree Cache Coherence Protocol for Distributed Shared Memory Multiprocessors

Directory schemes have long been used to solve the cache coherence problem for large scale shared memory multiprocessors. In addition, tree-based protocols have been employed to reduce the directory size and the invalidation latency for a large degree of data sharing in the system. However, the existing tree-based protocols involve a very high communication overhead for maintaining a balanced ...

Directoryless shared memory architecture using thread migration and remote access

Distributed directory cache coherence protocols for current many-core CMPs are not only difficult and error-prone to implement and verify, but also provide suboptimal performance when a thread requires access to large amounts of data distributed across the chip: the data must be brought to the core where the thread is running, incurring delays and energy costs. In this paper, we propose an appr...

Journal title:
  • IEEE Trans. Computers

Volume 50

Publication date: 2001